Hi there! Thanks for checking out my mini-project about Budapest public transport. The aim of this notebook is to visualize GTFS data for Budapest. The methods and functions used here are generic, so they can be applied to any other GTFS feed.
General Transit Feed Specification (GTFS) is a widely used specification that defines a common format in which agencies publish their transit data for consumption by software. Budapesti Közlekedési Központ (BKK), the transport agency of Budapest, publishes its transit data using the GTFS specification. GTFS comes with rich metadata, so it is easy to use in projects like this one.
In this project I enrich the GTFS data with GeoJSON geometries of the jurisdictional districts of Budapest. I use the districts to create a filter on the map. One can use this filter to see which vehicles enter or leave a specific district.
First, I briefly introduce the libraries I used to create a map of routes in Budapest. Then I discuss the functions that download, extract and enrich the GTFS data. Next, I show the transformations I applied to the data. Lastly, the results section showcases the map produced.
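To make the format concrete: a GTFS feed is a zip archive of comma-separated text files such as stops.txt and routes.txt. The sketch below builds a tiny in-memory feed and reads it back with the standard library only. The two stops and the helper names `build_feed` and `read_table` are invented for illustration and are not taken from BKK's feed.

```python
import csv
import io
import zipfile

# Hypothetical two-stop table; real feeds contain many more files and columns.
STOPS_TXT = (
    "stop_id,stop_name,stop_lat,stop_lon\n"
    "S1,Example Square,47.4979,19.0402\n"
    "S2,Sample Street,47.5000,19.0500\n"
)

def build_feed():
    """Return the bytes of a zip archive holding a single stops.txt."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("stops.txt", STOPS_TXT)
    return buffer.getvalue()

def read_table(feed, name):
    """Read one GTFS text file from the archive into a list of row dicts."""
    with zipfile.ZipFile(io.BytesIO(feed)) as archive:
        with archive.open(name) as handle:
            # utf-8-sig tolerates an optional byte-order mark at the file start
            text = io.TextIOWrapper(handle, encoding="utf-8-sig", newline="")
            return list(csv.DictReader(text))

stops = read_table(build_feed(), "stops.txt")
print([row["stop_name"] for row in stops])
```

Every value comes back as a string, so coordinates such as stop_lat and stop_lon need an explicit conversion before any geometry is built.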
In brief, the packages used for this project fall into 3 categories: data extraction, data manipulation and visualization.
For data extraction I used 'requests', which downloads content from a website. I then used 'zipfile' to extract the contents of the downloaded archive. A few basic commands from the 'os' package are used as well.
For data manipulation I used 'pandas', the de facto standard Python package for tabular data, which offers SQL-like transformations. Since I am working with geometric shapes, points and lines, I used 'geopandas' and 'shapely' to handle the geospatial data.
For visualization I used 'keplergl', which creates beautiful maps from geospatial data.
# Import packages
import pandas as pd
import geopandas as gp
from shapely.geometry import Point, LineString, shape
import gtfs_functions as gtfs
from keplergl import KeplerGl
import zipfile
import requests
import os
import warnings
warnings.simplefilter(action='ignore')
To download the data from a website, one can use the get function from the requests package. The get_gtfs function issues a GET request to download the .zip file, writes it to the specified directory, and then extracts it into that same folder.
def get_gtfs(url, directory):
    # Derive the archive name from the last URL segment
    filename = url.split('/')[-1]
    path = os.path.join(directory, filename)
    # Remove a stale copy of the archive before downloading again
    if os.path.exists(path):
        os.remove(path)
    r = requests.get(url, allow_redirects=True)
    r.raise_for_status()  # fail early on HTTP errors instead of saving an error page
    with open(path, 'wb') as f:
        f.write(r.content)
    # Unpack the archive next to the downloaded file
    with zipfile.ZipFile(path, 'r') as zip_ref:
        zip_ref.extractall(directory)
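The download-and-extract step can be tried without network access by making the fetching function replaceable. The following is a minimal sketch under a few assumptions: it swaps requests for the standard library's urllib, and the names `fetch_bytes`, `download_and_extract` and `fake_fetch`, as well as the example.invalid URL, are hypothetical and not part of this notebook.

```python
import io
import os
import tempfile
import urllib.request
import zipfile

def fetch_bytes(url, timeout=30):
    """Download a URL with the standard library and return the response body."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()

def download_and_extract(url, directory, fetch=fetch_bytes):
    """Save the archive found at `url` into `directory` and unpack it there."""
    os.makedirs(directory, exist_ok=True)
    path = os.path.join(directory, url.split("/")[-1])
    # Mode "wb" overwrites an older copy, so no explicit removal is needed
    with open(path, "wb") as archive_file:
        archive_file.write(fetch(url))
    with zipfile.ZipFile(path) as archive:
        archive.extractall(directory)
        return archive.namelist()

# Offline demonstration: a fake fetcher that returns a tiny zip archive.
def fake_fetch(url):
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("stops.txt", "stop_id,stop_name\nS1,Example Square\n")
    return buffer.getvalue()

with tempfile.TemporaryDirectory() as tmp:
    names = download_and_extract("https://example.invalid/feed.zip", tmp, fetch=fake_fetch)
    extracted = sorted(os.listdir(tmp))
print(names, extracted)
```

Passing the fetcher as a parameter keeps the file handling testable on its own; for a real feed the default `fetch_bytes` (or a requests-based equivalent) would be used instead of `fake_fetch`.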
The read_file function parses the downloaded .txt files into pandas dataframes and, where a file contains coordinates, converts them into geopandas dataframes with a geometry column. When stops.txt is read, each stop is additionally tagged with the id of the district it lies in.
def read_file(dirname, file):
    df = pd.read_csv(dirname + file)
    # Reverse sort puts the longitude column first and the latitude column second
    geo_cols = sorted([col for col in df.columns if 'lon' in col or 'lat' in col], reverse=True)
    # routes.txt is excluded: 'route_long_name' contains 'lon' but holds no coordinates
    if len(geo_cols) != 0 and file != 'routes.txt':
        if file == 'stops.txt':
            # enrich with district column
            geojson = gp.read_file(dirname + 'BP_geojson.json')
            df['district'] = None
            for index, row in df.iterrows():
                point = Point(row['stop_lon'], row['stop_lat'])
                inter_district = geojson[geojson.geometry.intersects(point)]['id']
                if len(inter_district) != 0:
                    # a stop on a shared border can match several districts; keep the first
                    df.loc[index, 'district'] = inter_district.iloc[0]
            cols = ['stop_id', 'district', 'stop_name', 'stop_lat', 'stop_lon', 'stop_code',
                    'location_type', 'parent_station', 'wheelchair_boarding',
                    'stop_direction']
            df = df[cols]
        # convert to geopandas dataframe
        gdf = gp.GeoDataFrame(df, geometry=gp.points_from_xy(df[geo_cols[0]], df[geo_cols[1]]))
        gdf.drop([geo_cols[0], geo_cols[1]], axis=1, inplace=True)
        if file == 'shapes.txt':
            # join the points of each shape into one line (assumes rows are ordered by shape_pt_sequence)
            gdf2 = gdf.groupby(['shape_id'])['geometry'].apply(lambda x: LineString(x.tolist()))
            gdf2 = gp.GeoDataFrame(gdf2, geometry='geometry')
            return gdf2.reset_index()
        else:
            return gdf
    else:
        return df
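For intuition about the district lookup in read_file, here is a minimal pure-Python sketch of a point-in-polygon test using ray casting. It is an illustration only: geopandas and shapely rely on their own geometry engine, and unlike `intersects`, this simplified version does not treat points exactly on a border in any special way. The functions `point_in_polygon` and `assign_district` and the two square "districts" are made up for the example.

```python
def point_in_polygon(lon, lat, ring):
    """Ray casting: is (lon, lat) inside the ring given as (lon, lat) vertices?"""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the point's latitude can be crossed
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # each crossing to the east flips the state
    return inside

def assign_district(lon, lat, districts):
    """Return the id of the first district containing the point, else None."""
    for district_id, ring in districts.items():
        if point_in_polygon(lon, lat, ring):
            return district_id
    return None

# Two made-up unit squares side by side, not real Budapest boundaries.
districts = {
    "A": [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)],
    "B": [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0), (1.0, 1.0)],
}

print(assign_district(0.5, 0.5, districts))
print(assign_district(1.5, 0.5, districts))
print(assign_district(3.0, 0.5, districts))
```

A point outside every polygon yields None, which matches how stops outside all districts keep an empty district value in read_file.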
# get gtfs data
url = 'https://bkk.hu/gtfs/budapest_gtfs.zip'
get_gtfs(url, './data/')
# Parse GTFS
routes = read_file('./data/', 'routes.txt')
stops = read_file('./data/', 'stops.txt')
stop_times = read_file('./data/', 'stop_times.txt')
trips = read_file('./data/', 'trips.txt')
shapes = read_file('./data/', 'shapes.txt')
# add additional columns to stop_times
stop_times = stop_times.merge(trips[['trip_id', 'route_id', 'shape_id', 'direction_id']], on='trip_id').merge(stops[['stop_id', 'district', 'stop_name', 'geometry']], on='stop_id')
The routes dataframe contains descriptive data about each line BKK operates. Merging it with shapes, trips and stop_times creates a large dataframe containing all relevant information about each route: the geospatial data for each line, the route name, its type and the color code assigned by BKK. Each route is duplicated once for every district it passes through.
route_district_shapes = stop_times[['shape_id', 'district']].drop_duplicates().merge(shapes, on='shape_id')\
.merge(routes[['route_id','route_short_name','route_type', 'route_color']].merge(trips[['route_id', 'shape_id']].drop_duplicates(), on='route_id'), on='shape_id')
route_district_shapes['route_type'] = route_district_shapes['route_type'].replace({0: 'Tram', 1: 'Metro',
3: 'Bus', 4: 'Ferry',
109: 'Rail', 800: 'Trolleybus'})
route_district_shapes = gp.GeoDataFrame(route_district_shapes, geometry=route_district_shapes.geometry)
route_district_shapes.loc[route_district_shapes.route_type == 'Metro', 'route_type'] = route_district_shapes.loc[(route_district_shapes.route_type == 'Metro')]['route_short_name']
route_district_shapes.loc[route_district_shapes.route_type == 'Rail', 'route_type'] = route_district_shapes.loc[(route_district_shapes.route_type == 'Rail')]['route_short_name']
route_district_shapes.loc[route_district_shapes.route_color == '1E1E1E', 'route_type'] = 'Night Bus'
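The relabelling rules in the cell above are applied in a fixed order, which is easier to see when written as a single function. This is a plain-Python sketch of the same logic; `label_route` is a hypothetical helper, and the sample colors other than 1E1E1E are placeholders rather than values checked against BKK's feed.

```python
# Mapping of route_type codes to names, as used in the notebook
ROUTE_TYPE_NAMES = {0: "Tram", 1: "Metro", 3: "Bus", 4: "Ferry",
                    109: "Rail", 800: "Trolleybus"}
NIGHT_BUS_COLOR = "1E1E1E"

def label_route(route_type, route_short_name, route_color):
    """Return the display label for one route, mirroring the notebook's rules."""
    # 1. Translate the numeric code; unknown codes pass through unchanged
    label = ROUTE_TYPE_NAMES.get(route_type, route_type)
    # 2. Metro and rail lines are labelled by their own short name
    if label in ("Metro", "Rail"):
        label = route_short_name
    # 3. The night-bus color overrides whatever was assigned before
    if route_color == NIGHT_BUS_COLOR:
        label = "Night Bus"
    return label

# Illustrative inputs; the colors "AAAAAA" and "BBBBBB" are placeholders.
print(label_route(1, "M2", "AAAAAA"))
print(label_route(3, "7", "BBBBBB"))
print(label_route(3, "907", "1E1E1E"))
```

Because the color rule runs last, a route drawn in 1E1E1E is labelled as a night bus regardless of its route_type.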
KeplerGL is, in my opinion, the best package for creating beautiful maps in Python, and it is well worth checking out.
Using KeplerGL, I placed the LineString of each route on the map of Budapest. One can filter on a district to see which routes enter or leave it.
bp_map = KeplerGl(height=500)
bp_map.add_data(data=route_district_shapes, name="Route districts")
bp_map
User Guide: https://docs.kepler.gl/docs/keplergl-jupyter
I also used the HEX color codes that BKK provides for each route to color the routes. To preserve such changes, store the map's configuration as in the code below. You can then save the map you created to an HTML file.
config = bp_map.config
bp_map.save_to_html(file_name='bp.html')
Here are the results: an accurate representation of each route in Budapest. As can be seen, Budapest depends heavily on its bus lines (blue), while trolleybuses (red) and trams (yellow) dominate the inner city.
